The Construction of a Tagged Danish Corpus
Abstract
The object of this paper is to present ongoing work on the construction of a morphosyntactically tagged Danish corpus, which is an integral step in the making of a Constraint Grammar (CG) parser for Danish and also constitutes a part of the Danish contribution to the European PAROLE project. This paper discusses various aspects of the morphological description of Danish used here as well as some of the guidelines developed for the manual disambiguation process. Finally, it also briefly gives an overview of the objectives of the two projects involved.
Similar resources
Corpus based coreference resolution for Farsi text
"Coreference resolution" or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreference when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
PAYMA: A Tagged Corpus of Persian Named Entities
The goal in the named entity recognition task is to classify proper nouns of a piece of text into classes such as person, location, and organization. Named entity recognition is an important preprocessing step in many natural language processing tasks such as question-answering and summarization. Although many research studies have been conducted in this area in English and the state-of-the-art...
From Treebank to Propbank: A Semantic-Role and VerbNet Corpus for Danish
This paper presents the first version of a Danish Propbank/VerbNet corpus, annotated at both the morphosyntactic, dependency and semantic levels. Both verbal and nominal predications were tagged with frames consisting of a VerbNet class and semantic role-labeled arguments and satellites. As a second semantic annotation layer, the corpus was tagged with both a noun ontology and NER classes. Draw...
The Construction of a Chinese Named Entity Tagged Corpus: CNEC1.0
In order to build an automatic named entity recognition (NER) system for machine learning, a large tagged corpus is necessary. This paper describes the manual construction of a Chinese named entity tagged corpus (CNEC 1.0) that can be used to improve NER performance. In this project, we define five named entity tags: PER (person name), LOC (location name), ORG (organization name), LAO (location...
Comma checking in Danish
This paper describes research in using the Brill tagger (Brill 94,95) to learn to identify incorrect commas in Danish. Trained on a part-of-speech tagged corpus of 600,000 words, the system identifies incorrect commas with a precision of 91% and a recall of 77%. The system was developed by randomly inserting commas in a text, which were tagged as incorrect, while the original commas were tagged...
Cultural Influence on the Expression of Cathartic Conceptualization in English and Spanish: A Corpus-Based Analysis
This paper investigates the conceptualization of emotional release from a cognitive linguistics perspective (Cognitive Metaphor Theory). The metaphor weeping is a means of liberating contained emotions is grounded in universal embodied cognition and is reflected in linguistic expressions in English and Spanish. Lexicalization patterns which encapsulate this conceptualization i...